Dissecting Population Substructure in India via Correlation Optimization of Genetics and Geodemographics
Abstract
India represents an intricate tapestry of population substructure shaped by geography, language, culture and social stratification operating in concert [1–3]. To date, no study has attempted to model and evaluate how these evolutionary forces have interacted to shape the patterns of genetic diversity within India. Geography has been shown to correlate closely with genetic structure in other parts of the world [4,5]. However, the strict endogamy imposed by the Indian caste system and the large number of spoken languages add further levels of complexity. We merged all publicly available data from the Indian subcontinent into a dataset of 835 individuals across 48,373 SNPs from 84 well-defined groups [2, 6–9]. Bringing together geography, sociolinguistics and genetics, we developed COGG (Correlation Optimization of Genetics and Geodemographics) to build a model that optimally explains the observed population genetic substructure. We find that shared language, rather than geography or social structure, has been the most powerful force in creating paths of gene flow within India. Further investigating the origins of Indian substructure, we create population genetic networks across Eurasia. We observe two major corridors towards mainland India: one through the Northwestern and another through the Northeastern frontier, with the Uygur population acting as a bridge across the two routes. Importantly, network and ADMIXTURE analyses and f3 statistics support a far northern path connecting Europe to Siberia, and gene flow from Siberia and Mongolia towards Central Asia and India.

The genetic structure of human populations reflects gene flow around and through geographic, linguistic, cultural, and social barriers. We set out to explore how the complex interplay of these factors may shape the patterns of genetic variation, focusing on India, a country with an intriguing level of population structure complexity.
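For readers unfamiliar with the f3 statistics cited in the abstract, the following is a minimal sketch of the naive estimator f3(C; A, B), the mean over SNPs of (c - a)(c - b), where c, a and b are allele frequencies in a target population and two candidate source populations. A clearly negative value suggests that the target is admixed between groups related to A and B. The function name `f3_naive` and all frequencies below are hypothetical illustrations, not data or code from the study; published analyses typically also apply a finite-sample bias correction and estimate standard errors by block jackknife, both omitted here.

```python
def f3_naive(target, source_a, source_b):
    """Naive f3(target; source_a, source_b).

    Each argument is a list of allele frequencies (one per SNP, same order).
    Returns the mean over SNPs of (c - a) * (c - b).
    No bias correction or standard error is computed (illustration only).
    """
    if not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("all populations must cover the same SNPs")
    if not target:
        raise ValueError("at least one SNP is required")
    products = [
        (c - a) * (c - b)
        for c, a, b in zip(target, source_a, source_b)
    ]
    return sum(products) / len(products)


# Hypothetical allele frequencies at four SNPs (not from the paper).
pop_a = [0.10, 0.80, 0.30, 0.60]
pop_b = [0.90, 0.20, 0.70, 0.40]

# A toy target built as an equal mixture of the two sources.
mixed = [0.5 * a + 0.5 * b for a, b in zip(pop_a, pop_b)]

# Negative for the mixed target, zero when the target equals a source.
print(f3_naive(mixed, pop_a, pop_b))
print(f3_naive(pop_a, pop_a, pop_b))
```

In this toy case the mixed target lies between the two sources at every SNP, so each product is negative and the statistic falls below zero, which is the signal such tests look for.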
The caste system in India has been documented since 1500-1000 BC and has imposed strict rules of endogamy for several thousand years. Social stratification within India may be summarised into the so-called Forward Castes and Backward Castes [10], while 8.2% of the total population belongs to Scheduled Tribes, minorities that lie outside the caste system, still rely largely on hunting, gathering and unorganized agriculture, and have no written form of language [11]. Furthermore, there are 22 official languages within India, which also follow a distinctive geographic spread. The Dravidian (DR) speaking groups inhabit southern India, Indo-European (IE) speakers inhabit primarily northern India (but also parts of west and east India) and Tibeto-Burman (TB) speakers are mostly confined to northeastern India. The numerically small group of Austro-Asiatic (AA) speakers, who are exclusively tribal and are thought to be the original inhabitants of mainland India, inhabit fragmented geographical areas of eastern and central India. Previous studies have